On Accelerating the Nonsymmetric Eigenvalue Problem in Multicore Architectures
Scientific computing relies on state-of-the-art algorithms in software libraries and packages. Achieving high performance in today's computing landscape is challenging, as modern High-Performance Computing (HPC) architectures increasingly combine multicore processors with special-purpose hardware such as graphics processing units (GPUs). This heterogeneous environment necessitates algorithms and software packages that can fully exploit its structure, and maximizing the parallelism within an algorithm is an essential step toward high performance. One of the fundamental computations in numerical linear algebra is the computation of eigenvalues and eigenvectors, and matrix eigenvalue problems arise in many applications. The algorithm of choice depends on the nature of the matrix: for sparse matrices there are several viable approaches, but for dense nonsymmetric matrices the QR algorithm remains the method of choice. Unfortunately, on modern computer platforms this approach faces limitations in parallelism and data movement, and current implementations either do not scale well or do not take full advantage of the heterogeneous computing environment.

The goal of this work is to examine nonsymmetric matrix eigenvalue problems in the context of HPC. In Chapter 2, we examine tile algorithms and the implementation of block Arnoldi expansion on multicore architectures; pseudocodes and implementation details are provided along with performance results. In Chapter 3, we examine various algorithms for computing a partial Schur factorization of nonsymmetric matrices. We study several iterative approaches and present implementations of specific methods, including a block version of explicitly restarted Arnoldi with deflation, a block extension of Stewart's Krylov-Schur method, and a block version of Jacobi-Davidson.
We present a new formulation of block Krylov-Schur that is robust and achieves improved performance on eigenvalue problems with sparse matrices; we experiment with dense matrices as well. In Chapter 4, we expand on this work and present a C implementation of our block Krylov-Schur approach built on LAPACK routines. The code is robust, represents a first step toward an optimized version, and can use any desired block size and compute any number of desired eigenvalues. Detailed pseudocodes are provided along with a theoretical analysis.
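To fix ideas, the single-vector Arnoldi process underlying all of these methods can be sketched in a few lines. This is a minimal illustrative version in Python/NumPy, not the block C/LAPACK code developed in this work: `m` Arnoldi steps build an orthonormal Krylov basis `V` and an upper Hessenberg projection `H`, and a Schur decomposition of the projected matrix yields Ritz values together with an approximate partial Schur basis for `A` (the matrix `A`, start vector, and tolerance here are arbitrary choices for illustration).

```python
import numpy as np
from scipy.linalg import schur

def arnoldi(A, v0, m):
    """m-step Arnoldi: A @ V[:, :m] = V @ H with V orthonormal, H upper Hessenberg."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):              # modified Gram-Schmidt orthogonalization
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:             # breakdown: an invariant subspace was found
            return V[:, : j + 1], H[: j + 1, : j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(0)
n, m = 100, 20
A = rng.standard_normal((n, n))
V, H = arnoldi(A, rng.standard_normal(n), m)

# Real Schur form of the projected matrix: Ritz values sit in the (block) diagonal
# of T, and V[:, :m] @ Q approximates a partial Schur basis for A.
T, Q = schur(H[:m, :m])
ritz = np.linalg.eigvals(H[:m, :m])
```

The block methods studied in Chapters 3 and 4 generalize this loop by expanding the basis several vectors at a time, which replaces vector operations with matrix-matrix products that map better onto multicore hardware.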